NSF PAR Search | NSF Public Access Repository

K-means and hierarchical clustering of f0 contours

https://doi.org/10.21437/Interspeech.2024-181

Kaland, Constantijn; Steffman, Jeremy; Cole, Jennifer (September 2024, International Speech Communication Association)

Cluster analysis of time-series f0 data is an increasingly popular method in intonation research. Applying it involves a number of methodological decisions that may affect the clustering results and, potentially, the conclusions of the research. This paper investigates the extent to which the choice between K-means and hierarchical clustering, two of the most popular clustering methods, leads to grouping differences that are potentially relevant for intonation research. This is tested using a dataset of f0 measures taken from imitated intonation patterns in American English. The analysis comprises a generic correlation test between K-means and hierarchical clustering outcomes as well as a number of specific measures assessing partitioning quality and f0 contour differences. The results show that the two clustering methods generally yield very similar outcomes, although specific clusterings can differ considerably.
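One simple way to quantify how far two partitions of the same contours agree is a pair-counting index. The sketch below is an illustration only: the abstract does not say which correlation test or partitioning measures the authors used, so the Rand index here is a swapped-in technique, and the function name `rand_index` and the toy label lists are hypothetical.

```python
from itertools import combinations


def rand_index(labels_a, labels_b):
    """Return the fraction of item pairs on which two partitions agree.

    A pair counts as an agreement when both partitions put the two items
    in the same cluster, or both put them in different clusters.
    Cluster label values themselves do not need to match.
    """
    if len(labels_a) != len(labels_b):
        raise ValueError("label lists must have the same length")
    pairs = list(combinations(range(len(labels_a)), 2))
    if not pairs:
        raise ValueError("need at least two items")
    agree = 0
    for i, j in pairs:
        same_a = labels_a[i] == labels_a[j]
        same_b = labels_b[i] == labels_b[j]
        if same_a == same_b:
            agree += 1
    return agree / len(pairs)


# Hypothetical cluster labels for six f0 contours (not data from the paper).
kmeans_labels = [0, 0, 1, 1, 2, 2]
hier_labels = [1, 1, 0, 0, 0, 2]
print(rand_index(kmeans_labels, hier_labels))  # 0.8
```

A value of 1.0 means the two methods group the contours identically; lower values indicate that some pairs of contours are grouped together by one method but separated by the other.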
Full Text Available
Functional modeling of F0 variation across speakers and between phonological categories: Rising pitch accents in American English

https://doi.org/10.21437/SpeechProsody.2024-206

Cole, Jennifer; Steffman, Jeremy; Awwad, Aya (July 2024, ISCA)

The Autosegmental-Metrical model of American English distinguishes three pitch accents with rising F0 trajectories (H*, L+H*, L*+H), differing in peak alignment and presence vs. absence of a low pitch marking the rise onset. Empirical studies report additional distinctions in the dynamics and scaling of the F0 rise, raising the question of which properties best capture variation among accents. We use functional principal components analysis (FPCA) to examine dynamic properties of accentual F0 trajectories in data from an intonation imitation experiment. F0 trajectories from 70 speakers producing rising accents on the phrase-final (nuclear) accented word were submitted to FPCA. The first three PCs account for 95% of variation in F0 trajectories and each shows significant differences between the three rising accents. Variation in PC1 primarily relates to differences in the overall F0 level of the trajectory, PC2 captures differences in rise shape (scooped vs. domed rise) and PC3 captures fine variation from a following Low phrase accent. Alignment distinctions are distributed across all three PCs. Examination of individual speakers shows all use PC1 and PC2 to some degree to distinguish rising accents, with no trading relations. Rises are variously implemented through level or shape distinctions, to varying degrees across individuals.
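The idea of a principal component that tracks overall F0 level can be illustrated with a discretized stand-in for FPCA. The sketch below is not the authors' procedure: it runs ordinary PCA on contours sampled at a common set of time points (no basis-function smoothing) and extracts only the first component by power iteration. The function name `first_pc` and the toy contours are hypothetical.

```python
import math


def first_pc(curves, iterations=200):
    """Return (pc1, scores, explained) for equally sampled curves.

    pc1 is the unit-length first principal component of the sampled
    curves, scores are each curve's projection on it, and explained is
    the proportion of total variance that pc1 accounts for.
    """
    n, m = len(curves), len(curves[0])
    if n < 2 or any(len(c) != m for c in curves):
        raise ValueError("need >= 2 curves of equal length")
    mean = [sum(c[t] for c in curves) / n for t in range(m)]
    centered = [[c[t] - mean[t] for t in range(m)] for c in curves]
    # Sample covariance between every pair of time points (m x m).
    cov = [[sum(r[s] * r[t] for r in centered) / (n - 1) for t in range(m)]
           for s in range(m)]
    trace = sum(cov[t][t] for t in range(m))
    if trace == 0:
        raise ValueError("curves are identical; no variance to explain")
    # Power iteration from a fixed, non-constant start vector (assumed not
    # to be orthogonal to the leading eigenvector).
    v = [1.0 + 0.1 * t for t in range(m)]
    for _ in range(iterations):
        w = [sum(cov[s][t] * v[t] for t in range(m)) for s in range(m)]
        norm = math.sqrt(sum(x * x for x in w))
        if norm == 0:
            raise ValueError("start vector lies in the null space")
        v = [x / norm for x in w]
    eigenvalue = sum(v[s] * sum(cov[s][t] * v[t] for t in range(m))
                     for s in range(m))
    scores = [sum(r[t] * v[t] for t in range(m)) for r in centered]
    return v, scores, eigenvalue / trace


# Hypothetical rising contours in Hz that differ only in overall level.
curves = [
    [100, 110, 120, 130],
    [120, 130, 140, 150],
    [140, 150, 160, 170],
]
pc1, scores, explained = first_pc(curves)
print(pc1, scores, explained)
```

Because these toy contours differ only by a constant offset, the first component is flat across time and explains all of the variance, which mirrors the description of PC1 as a level component. Real FPCA implementations typically represent each contour with smooth basis functions before the decomposition.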
Full Text Available
Intonational categories and continua in American English rising nuclear tunes

https://doi.org/10.1016/j.wocn.2024.101310

Steffman, Jeremy; Cole, Jennifer; Shattuck-Hufnagel, Stefanie (May 2024, Journal of Phonetics)

Full Text Available
Metrical enhancement in American English nuclear tunes

https://doi.org/10.16995/glossa.15297

Steffman, Jeremy; Cole, Jennifer (January 2024, Glossa: a journal of general linguistics)

We present two experiments aimed at testing the nature of intonational categories through the lens of enhancement. In an imitative speech production paradigm, speakers heard a model intonational tune and were prompted to reproduce that tune on a new sentence in which the syllable count of the word carrying the tune varied. Using the prevalent Autosegmental-Metrical model of American English as a basis for potential tune categories, we test how distinctions among tunes are enhanced across different metrical structures. First, with a clustering analysis, we find that not all predicted distinctions are emergent. Second, only the largest distinctions, those that emerge in the clustering analysis, are enhanced as a function of metrical structure. Measurable differences between tunes that cluster together are detectable but, critically, are not enhanced. We discuss what these results mean for the nature and number of intonational categories in the system.
Full Text Available
Short-term exposure alters adult listeners' perception of segmental phonotactics

https://doi.org/10.1121/10.0023900

Steffman, Jeremy; Sundara, Megha (December 2023, JASA Express Letters)

This study evaluates the malleability of adults' perception of phonotactic (biphone) probabilities, building on a body of literature on statistical phonotactic learning. The study first replicates the finding that listeners categorize phonetic continua as sounds that create higher-probability sequences in their native language. Listeners were also exposed to skewed distributions of biphone contexts, which resulted in the enhancement or reversal of these effects. Thus, listeners dynamically update biphone probabilities (BPs) and bring these to bear on the perception of ambiguous acoustic information. These effects can override long-term BP effects rooted in native language experience.
Disentangling the Role of Biphone Probability From Neighborhood Density in the Perception of Nonwords

https://doi.org/10.1177/00238309231164982

Steffman, Jeremy; Sundara, Megha (May 2023, Language and Speech)

In six experiments we explored how biphone probability and lexical neighborhood density influence listeners’ categorization of vowels embedded in nonword sequences. We found independent effects of each. Listeners shifted categorization of a phonetic continuum to create a higher probability sequence, even when neighborhood density was controlled. Similarly, listeners shifted categorization to create a nonword from a denser neighborhood, even when biphone probability was controlled. Next, using a visual world eye-tracking task, we determined that biphone probability information is used rapidly by listeners in perception. In contrast, task complexity and irrelevant variability in the stimuli interfere with neighborhood density effects. These results support a model in which both biphone probability and neighborhood density independently affect word recognition, but only biphone probability effects are observed early in processing.
Full Text Available
An automated method for detecting F0 measurement jumps based on sample-to-sample differences

https://doi.org/10.1121/10.0015045

Steffman, Jeremy; Cole, Jennifer (November 2022, JASA Express Letters)

An algorithm is introduced for detecting sudden jumps in measured F0, which are likely to be inaccurate measures. The method computes sample-to-sample differences in F0 and, based on a user-defined threshold, determines whether a difference is larger than naturally produced F0 velocities; if so, it is flagged as an error. Various parameter settings are evaluated on a corpus of 30 American English speakers producing different intonational patterns, for which F0 tracking errors were manually checked. The paper concludes by recommending settings for the algorithm and ways in which it can be used to facilitate analyses of F0 in speech research.
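The core idea of thresholding sample-to-sample differences can be sketched in a few lines. This is a minimal illustration under stated assumptions, not the published algorithm: the function name `flag_f0_jumps`, the use of `None` for unvoiced frames, the threshold expressed in Hz per sample, and the example values are all hypothetical, and the paper's recommended settings are not reproduced here.

```python
def flag_f0_jumps(f0, threshold):
    """Flag samples whose F0 differs from the previous voiced sample
    by more than `threshold` (same unit as f0, e.g. Hz).

    Unvoiced frames are given as None. Differences are not computed
    across unvoiced gaps, so the first sample after a gap is never flagged.
    """
    flags = [False] * len(f0)
    prev = None  # index of the immediately preceding voiced sample, if any
    for i, value in enumerate(f0):
        if value is None:
            prev = None
            continue
        if prev is not None and abs(value - f0[prev]) > threshold:
            flags[i] = True
        prev = i
    return flags


# Hypothetical F0 track in Hz with an octave-halving error at index 3.
f0 = [200, 202, 205, 103, 104, None, 210, 212]
print(flag_f0_jumps(f0, threshold=30))
# [False, False, False, True, False, False, False, False]
```

Note that plain differencing flags only the sample where the jump occurs: the following sample at 104 Hz is close to its (erroneous) predecessor and passes unflagged, so a practical tool would need some further rule for handling the samples that follow a flagged jump.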
Full Text Available
Hierarchical distinctions in the production and perception of nuclear tunes in American English

https://doi.org/10.16995/labphon.9437

Cole, Jennifer; Steffman, Jeremy; Shattuck-Hufnagel, Stefanie; Tilsen, Sam (January 2023, Laboratory Phonology)

In Autosegmental-Metrical models of intonational phonology, different types of pitch accents, phrase accents, and boundary tones concatenate to create a set of phonologically distinct phrase-final nuclear tunes. This study asks if an eight-way distinction in nuclear tune shape in American English, predicted from the combination of two (monotonal) pitch accents, two phrase accents, and two boundary tones, is evident in speech production and in speech perception. F0 trajectories from a large-scale imitative speech production experiment were analyzed using bottom-up (k-means) clustering, neural net classification, GAMM modeling, and modeling of turning point alignment. Listeners’ perception of the same tunes was tested in a perceptual discrimination task and related to the imitation results. Emergent grouping of tunes in the clustering analysis, and related classification accuracy from the neural net, show a merging of some of the predicted distinctions among tunes whereby tune shapes that vary primarily in the scaling of final f0 are not reliably distinguished. Within five emergent clusters, subtler distinctions among tunes are evident in GAMMs and f0 turning point modeling. Clustering of individual participants’ production data shows a range of partitions of the data, with nearly all participants making a primary distinction between a class of High-Rising and Non-High-Rising tunes, and with up to four secondary distinctions among the non-Rising class. Perception results show a similar pattern, with poor pairwise discrimination for tunes that differ primarily, but by a small degree, in final f0, and highly accurate discrimination when just one member of a pair is in the High-Rising tune class. Together, the results suggest a hierarchy of distinctiveness among nuclear tunes, with a robust distinction based on holistic tune shape and poorly differentiated distinctions between tunes with the same holistic shape but small differences in final f0. The observed distinctions from clustering, classification, and perception analyses align with the tonal specification of a binary pitch accent contrast {H*, L*} and a maximally ternary {H%, M%, L%} boundary tone contrast; the findings do not support distinct tonal specifications for the phrase accent and boundary tone from the AM model.
Full Text Available
Shape matters: Machine classification and listeners’ perceptual discrimination of American English intonational tunes

https://doi.org/10.21437/SpeechProsody.2022-61

Cole, Jennifer; Steffman, Jeremy; Tilsen, Sam (May 2022, SpeechProsody)

Full Text Available
The rise and fall of American English pitch accents: Evidence from an imitation study of rising nuclear tunes

https://doi.org/10.21437/SpeechProsody.2022-174

Steffman, Jeremy; Shattuck-Hufnagel, Stefanie; Cole, Jennifer (May 2022, SpeechProsody)

Full Text Available
